{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "QSMmdrrVLZ-N"
   },
   "source": [
    "# Data Visualization in Jupyter <a class=\"tocSkip\">\n",
    "\n",
    "A common use for notebooks is data visualization. The workspace makes this easy with several charting and visualization tools available pre-installed as Python imports.\n",
    "\n",
    "**In this notebook:**\n",
    "\n",
    "* [Matplotlib](#Matplotlib)\n",
    "* [Seaborn](#Seaborn)\n",
    "* [Altair](#Altair)\n",
    "* [Plotly](#Plotly)\n",
    "* [HoloViews](#HoloViews)\n",
    "* [Bqplot](#Bqplot)\n",
    "* [UMAP](#UMAP)\n",
    "* [Pandas-Profiling](#Pandas-Profiling)\n",
    "* [Missingno](#Missingno)\n",
    "* [SHAP](#SHAP)\n",
    "* [Pillow](#Pillow)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "xNzEBRkzL3B0"
   },
   "source": [
    "## Matplotlib\n",
    "\n",
    "[Matplotlib](http://matplotlib.org/) is the most common charting package, see its [documentation](http://matplotlib.org/api/pyplot_api.html) for details, and its [examples](http://matplotlib.org/gallery.html#statistics) for inspiration."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-11-16T20:38:33.726250Z",
     "start_time": "2018-11-16T20:38:33.721946Z"
    }
   },
   "source": [
    "### Activated Matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:51:32.108087Z",
     "start_time": "2020-01-26T10:51:32.088603Z"
    }
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "plt.rcParams[\"figure.figsize\"] = (12,6)\n",
    "%config InlineBackend.figure_format='retina'  # adapt plots for retina displays"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "WALI8x49GUpe"
   },
   "source": [
    "### Line Plots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:51:33.906355Z",
     "start_time": "2020-01-26T10:51:33.573811Z"
    },
    "colab": {},
    "colab_type": "code",
    "id": "08RTGn_xE3MP",
    "outputId": "908732cc-757d-46d8-ac75-5bdb5ba51cfe"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "x  = [1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
    "y1 = [1, 3, 5, 3, 1, 3, 5, 3, 1]\n",
    "y2 = [2, 4, 6, 4, 2, 4, 6, 4, 2]\n",
    "plt.plot(x, y1, label=\"line L\")\n",
    "plt.plot(x, y2, label=\"line H\")\n",
    "plt.plot()\n",
    "\n",
    "plt.xlabel(\"x axis\")\n",
    "plt.ylabel(\"y axis\")\n",
    "plt.title(\"Line Graph Example\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "sIZLTZ0pdo0Z"
   },
   "source": [
    "### Bar Plots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:51:35.526724Z",
     "start_time": "2020-01-26T10:51:35.188804Z"
    },
    "colab": {},
    "colab_type": "code",
    "id": "bZv4MenQpYOF",
    "outputId": "ce9608d5-e7ea-4ac7-856f-a265b3d0eee9"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Look at index 4 and 6, which demonstrate overlapping cases.\n",
    "x1 = [1, 3, 4, 5, 6, 7, 9]\n",
    "y1 = [4, 7, 2, 4, 7, 8, 3]\n",
    "\n",
    "x2 = [2, 4, 6, 8, 10]\n",
    "y2 = [5, 6, 2, 6, 2]\n",
    "\n",
    "# Colors: https://matplotlib.org/api/colors_api.html\n",
    "\n",
    "plt.bar(x1, y1, label=\"Blue Bar\", color='b')\n",
    "plt.bar(x2, y2, label=\"Green Bar\", color='g')\n",
    "plt.plot()\n",
    "\n",
    "plt.xlabel(\"bar number\")\n",
    "plt.ylabel(\"bar height\")\n",
    "plt.title(\"Bar Chart Example\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "YQO2Lw8Xdu7x"
   },
   "source": [
    "### Histograms"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:51:39.246429Z",
     "start_time": "2020-01-26T10:51:36.525502Z"
    },
    "cellView": "both",
    "colab": {},
    "colab_type": "code",
    "id": "SZ-DMbnPMbMY",
    "outputId": "8c1089d5-ecf0-492b-cf21-8f30ddc45862"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "# Use numpy to generate a bunch of random data in a bell curve around 5.\n",
    "n = 5 + np.random.randn(1000)\n",
    "\n",
    "m = [m for m in range(len(n))]\n",
    "plt.bar(m, n)\n",
    "plt.title(\"Raw Data\")\n",
    "plt.show()\n",
    "\n",
    "plt.hist(n, bins=20)\n",
    "plt.title(\"Histogram\")\n",
    "plt.show()\n",
    "\n",
    "plt.hist(n, cumulative=True, bins=20)\n",
    "plt.title(\"Cumulative Histogram\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9-CelVUmdz8r"
   },
   "source": [
    "### Scatter Plots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:51:39.680376Z",
     "start_time": "2020-01-26T10:51:39.249965Z"
    },
    "colab": {},
    "colab_type": "code",
    "id": "79C7jc9mv-Ji",
    "outputId": "cf0c0e0e-8443-465e-f135-c518d818d486"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "x1 = [2, 3, 4]\n",
    "y1 = [5, 5, 5]\n",
    "\n",
    "x2 = [1, 2, 3, 4, 5]\n",
    "y2 = [2, 3, 2, 3, 4]\n",
    "y3 = [6, 8, 7, 8, 7]\n",
    "\n",
    "# Markers: https://matplotlib.org/api/markers_api.html\n",
    "\n",
    "plt.scatter(x1, y1)\n",
    "plt.scatter(x2, y2, marker='v', color='r')\n",
    "plt.scatter(x2, y3, marker='^', color='m')\n",
    "plt.title('Scatter Plot Example')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "C0LOohpqeCjx"
   },
   "source": [
    "### Pie Charts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:51:42.480383Z",
     "start_time": "2020-01-26T10:51:42.252756Z"
    },
    "colab": {},
    "colab_type": "code",
    "id": "ZdEG-d4g4U6v",
    "outputId": "51f53730-aa0a-4a10-d5f0-ed4293dc832b"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "labels = 'S1', 'S2', 'S3'\n",
    "sections = [56, 66, 24]\n",
    "colors = ['c', 'g', 'y']\n",
    "\n",
    "plt.pie(sections, labels=labels, colors=colors,\n",
    "        startangle=90,\n",
    "        explode = (0, 0.1, 0),\n",
    "        autopct = '%1.2f%%')\n",
    "\n",
    "plt.axis('equal') # Try commenting this out.\n",
    "plt.title('Pie Chart Example')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "sX97x87MTyIf"
   },
   "source": [
    "### fill_between and alpha"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:51:44.800431Z",
     "start_time": "2020-01-26T10:51:44.473062Z"
    },
    "colab": {},
    "colab_type": "code",
    "id": "BCUl8mTMT4sN",
    "outputId": "f52ac738-a80d-4d4a-b112-e3acb9334eed"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "ys = 200 + np.random.randn(100)\n",
    "x = [x for x in range(len(ys))]\n",
    "\n",
    "plt.plot(x, ys, '-')\n",
    "plt.fill_between(x, ys, 195, where=(ys > 195), facecolor='g', alpha=0.6)\n",
    "\n",
    "plt.title(\"Fills and Alpha Example\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jONspxyzeT4Y"
   },
   "source": [
    "### Subplotting using Subplot2grid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:51:47.205860Z",
     "start_time": "2020-01-26T10:51:46.411776Z"
    },
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 401
    },
    "colab_type": "code",
    "id": "JF-dVGj3ExQm",
    "outputId": "df00765b-d58a-4ad0-ce52-b1670a86f486"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "def random_plots():\n",
    "  xs = []\n",
    "  ys = []\n",
    "  \n",
    "  for i in range(20):\n",
    "    x = i\n",
    "    y = np.random.randint(10)\n",
    "    \n",
    "    xs.append(x)\n",
    "    ys.append(y)\n",
    "  \n",
    "  return xs, ys\n",
    "\n",
    "fig = plt.figure()\n",
    "ax1 = plt.subplot2grid((5, 2), (0, 0), rowspan=1, colspan=2)\n",
    "ax2 = plt.subplot2grid((5, 2), (1, 0), rowspan=3, colspan=2)\n",
    "ax3 = plt.subplot2grid((5, 2), (4, 0), rowspan=1, colspan=1)\n",
    "ax4 = plt.subplot2grid((5, 2), (4, 1), rowspan=1, colspan=1)\n",
    "\n",
    "x, y = random_plots()\n",
    "ax1.plot(x, y)\n",
    "\n",
    "x, y = random_plots()\n",
    "ax2.plot(x, y)\n",
    "\n",
    "x, y = random_plots()\n",
    "ax3.plot(x, y)\n",
    "\n",
    "x, y = random_plots()\n",
    "ax4.plot(x, y)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "fllyxKu8edm6"
   },
   "source": [
    "### Matplotlib styles\n",
    "\n",
    "To customize styling further please see the [matplotlib docs](https://matplotlib.org/users/style_sheets.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "cRkuifc7PazR"
   },
   "source": [
    "## Seaborn\n",
    "\n",
    "There are several libraries layered on top of Matplotlib that you can use in the workspace. One that is worth highlighting is [Seaborn](http://seaborn.pydata.org). Here's a Seaborn [heatmap](https://seaborn.pydata.org/generated/seaborn.heatmap.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:51:50.481758Z",
     "start_time": "2020-01-26T10:51:49.316432Z"
    },
    "cellView": "both",
    "colab": {},
    "colab_type": "code",
    "id": "Fjw7UfGZQcL9",
    "outputId": "db8cf44f-dd21-491f-ef2d-e504d8ee7fe1"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import numpy as np\n",
    "\n",
    "# Make a 10 x 10 heatmap of some random data\n",
    "side_length = 10\n",
    "# Start with a 10 x 10 matrix with values randomized around 5\n",
    "data = 5 + np.random.randn(side_length, side_length)\n",
    "# The next two lines make the values larger as we get closer to (9, 9)\n",
    "data += np.arange(side_length)\n",
    "data += np.reshape(np.arange(side_length), (side_length, 1))\n",
    "# Generate the heatmap\n",
    "sns.heatmap(data)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "beTgCbVa_wFA"
   },
   "source": [
    "## Altair\n",
    "\n",
    "\n",
    "[Altair](https://altair-viz.github.io/index.html) is a declarative statistical visualization library for Python, based on Vega and Vega-Lite. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite visualization grammar. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T11:09:46.119689Z",
     "start_time": "2020-01-26T11:09:42.316463Z"
    }
   },
   "outputs": [],
   "source": [
    "from altair_widgets import interact_with\n",
    "interact_with(source)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T11:06:31.942233Z",
     "start_time": "2020-01-26T11:06:31.369721Z"
    }
   },
   "outputs": [],
   "source": [
    "import altair as alt\n",
    "from vega_datasets import data\n",
    "\n",
    "source = data.cars()\n",
    "\n",
    "brush = alt.selection(type='interval')\n",
    "\n",
    "points = alt.Chart(source).mark_point().encode(\n",
    "    x='Horsepower',\n",
    "    y='Miles_per_Gallon',\n",
    "    color=alt.condition(brush, 'Origin', alt.value('lightgray'))\n",
    ").add_selection(\n",
    "    brush\n",
    ")\n",
    "\n",
    "bars = alt.Chart(source).mark_bar().encode(\n",
    "    y='Origin',\n",
    "    color='Origin',\n",
    "    x='count(Origin)'\n",
    ").transform_filter(\n",
    "    brush\n",
    ")\n",
    "\n",
    "points & bars"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotly\n",
    "\n",
    "[Plotly](https://github.com/plotly/plotly.py) is an interactive, open-source, and browser-based graphing library for Python."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:52:18.507655Z",
     "start_time": "2020-01-26T10:52:13.616307Z"
    }
   },
   "outputs": [],
   "source": [
    "import plotly\n",
    "plotly.offline.init_notebook_mode(connected=False)\n",
    "from plotly.graph_objs import *\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Create dataframe with random data\n",
    "df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))\n",
    "\n",
    "data = [Bar(x=df.A,\n",
    "            y=df.B)]\n",
    "\n",
    "plotly.offline.iplot(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:52:21.331419Z",
     "start_time": "2020-01-26T10:52:20.688433Z"
    }
   },
   "outputs": [],
   "source": [
    "import plotly\n",
    "plotly.offline.init_notebook_mode()\n",
    "import numpy as np\n",
    "from plotly.offline import iplot\n",
    "from plotly.graph_objs import *\n",
    "\n",
    "x = np.random.randn(2000)\n",
    "y = np.random.randn(2000)\n",
    "iplot([Histogram2dContour(x=x, y=y, contours=histogram2dcontour.Contours(coloring='heatmap')),\n",
    "       Scatter(x=x, y=y, mode='markers', marker=scatter.Marker(color='white', size=3, opacity=0.3))], show_link=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## HoloViews\n",
    "\n",
    "\n",
    "[HoloViews](https://github.com/ioam/holoviews) lets you annotate your data and let it visualize itself. It helps you to express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:57:14.426481Z",
     "start_time": "2020-01-26T10:57:12.928570Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd \n",
    "import numpy as np\n",
    "import holoviews as hv\n",
    "hv.extension('plotly')\n",
    "\n",
    "# Load sample dataset\n",
    "from sklearn.datasets import load_iris\n",
    "iris = load_iris()\n",
    "iris_df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],\n",
    "                 columns= list(iris['feature_names']) + ['species'])\n",
    "\n",
    "# Declaring Data\n",
    "from holoviews.operation import gridmatrix\n",
    "\n",
    "ds = hv.Dataset(iris_df)\n",
    "\n",
    "grouped_by_species = ds.groupby('species', container_type=hv.NdOverlay)\n",
    "grid = gridmatrix(grouped_by_species, diagonal_type=hv.Scatter)\n",
    "\n",
    "# Plot\n",
    "grid.options('Scatter', bgcolor='#efe8e2', size=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bqplot\n",
    "\n",
    "[Bqplot](https://github.com/bloomberg/bqplot) is a 2-D visualization system for Jupyter, based on the constructs of the Grammar of Graphics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:52:38.566304Z",
     "start_time": "2020-01-26T10:52:38.362458Z"
    },
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 401
    },
    "colab_type": "code",
    "id": "60f2SO4jIfQz",
    "outputId": "3600f6d7-501d-4144-9c99-6b8f807b0d5c"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from bqplot import pyplot as plt\n",
    "\n",
    "plt.figure(1, title='Line Chart')\n",
    "np.random.seed(0)\n",
    "n = 200\n",
    "x = np.linspace(0.0, 10.0, n)\n",
    "y = np.cumsum(np.random.randn(n))\n",
    "plt.plot(x, y)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## UMAP\n",
    "\n",
    "Uniform Manifold Approximation and Projection ([UMAP](https://github.com/lmcinnes/umap)) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:52:58.055211Z",
     "start_time": "2020-01-26T10:52:42.654334Z"
    }
   },
   "outputs": [],
   "source": [
    "import umap\n",
    "from sklearn.datasets import load_digits\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "digits = load_digits()\n",
    "\n",
    "embedding = umap.UMAP().fit_transform(digits.data)\n",
    "embedding\n",
    "\n",
    "plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, cmap='Spectral', s=5)\n",
    "plt.gca().set_aspect('equal', 'datalim')\n",
    "plt.colorbar(boundaries=np.arange(11)-0.5).set_ticks(np.arange(10))\n",
    "plt.title('UMAP projection of the Digits dataset', fontsize=24);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pandas-Profiling\n",
    "\n",
    "[pandas-profiling](https://github.com/pandas-profiling/pandas-profiling) creates HTML profiling reports from pandas `DataFrame` objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:53:39.770985Z",
     "start_time": "2020-01-26T10:53:31.657066Z"
    }
   },
   "outputs": [],
   "source": [
    "# Load sample dataset\n",
    "import pandas as pd\n",
    "meteorite_df = pd.read_csv('https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv')\n",
    "meteorite_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:42.204598Z",
     "start_time": "2020-01-26T10:53:39.776008Z"
    }
   },
   "outputs": [],
   "source": [
    "# Load pandas_profiling\n",
    "import pandas_profiling\n",
    "\n",
    "# there are some irrelevant warning you might want to filter\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "# Generate report for dataframe\n",
    "pandas_profiling.ProfileReport(meteorite_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Missingno\n",
    "\n",
    "[Missingno](https://github.com/ResidentMario/missingno) provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:50.769681Z",
     "start_time": "2020-01-26T10:55:42.208568Z"
    }
   },
   "outputs": [],
   "source": [
    "# Load missingo\n",
    "import missingno as msno\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "# Load sample dataset\n",
    "import pandas as pd\n",
    "meteorite_df = pd.read_csv('https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv')\n",
    "meteorite_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Matrix\n",
    "\n",
    "The `msno.matrix` nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:52.357089Z",
     "start_time": "2020-01-26T10:55:50.773163Z"
    }
   },
   "outputs": [],
   "source": [
    "msno.matrix(meteorite_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Heatmap\n",
    "\n",
    "The `msno.heatmap` measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:53.404636Z",
     "start_time": "2020-01-26T10:55:52.361015Z"
    }
   },
   "outputs": [],
   "source": [
    "msno.heatmap(meteorite_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bar Chart\n",
    "\n",
    "The `msno.bar` is a simple visualization of nullity by column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:54.679335Z",
     "start_time": "2020-01-26T10:55:53.407124Z"
    }
   },
   "outputs": [],
   "source": [
    "msno.bar(meteorite_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dendrogram\n",
    "\n",
    "The `msno.dendrogram` allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:55.462536Z",
     "start_time": "2020-01-26T10:55:54.683684Z"
    }
   },
   "outputs": [],
   "source": [
    "msno.dendrogram(meteorite_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SHAP\n",
    "[SHAP](https://github.com/slundberg/shap) is a unified approach to explain the output of any machine learning model. SHAP connects game theory with local explanations, uniting several previous methods and representing the only possible consistent and locally accurate additive feature attribution method based on expectations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:57.880822Z",
     "start_time": "2020-01-26T10:55:55.466742Z"
    }
   },
   "outputs": [],
   "source": [
    "# Load SHAP\n",
    "import shap\n",
    "shap.initjs() # load JS visualization code to notebook\n",
    "\n",
    "# train XGBoost model\n",
    "import xgboost\n",
    "X,y = shap.datasets.boston()\n",
    "model = xgboost.train({\"learning_rate\": 0.01}, xgboost.DMatrix(X, label=y), 100)\n",
    "\n",
    "# explain the model's predictions using SHAP values\n",
    "# (same syntax works for LightGBM, CatBoost, and scikit-learn models)\n",
    "explainer = shap.TreeExplainer(model)\n",
    "shap_values = explainer.shap_values(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:57.898520Z",
     "start_time": "2020-01-26T10:55:57.886554Z"
    }
   },
   "outputs": [],
   "source": [
    "# visualize the first prediction's explanation\n",
    "shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:58.229616Z",
     "start_time": "2020-01-26T10:55:57.902849Z"
    }
   },
   "outputs": [],
   "source": [
    "# visualize the training set predictions\n",
    "shap.force_plot(explainer.expected_value, shap_values, X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:59.074554Z",
     "start_time": "2020-01-26T10:55:58.232525Z"
    }
   },
   "outputs": [],
   "source": [
    "# create a SHAP dependence plot to show the effect of a single feature across the whole dataset\n",
    "shap.dependence_plot(\"RM\", shap_values, X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:55:59.768581Z",
     "start_time": "2020-01-26T10:55:59.079923Z"
    }
   },
   "outputs": [],
   "source": [
    "# summarize the effects of all the features\n",
    "shap.summary_plot(shap_values, X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:56:00.160437Z",
     "start_time": "2020-01-26T10:55:59.773815Z"
    }
   },
   "outputs": [],
   "source": [
    "shap.summary_plot(shap_values, X, plot_type=\"bar\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pillow\n",
    "\n",
    "[Pillow](https://pillow.readthedocs.io/en/5.3.x/index.html) adds image processing capabilities to your Python interpreter. This library provides extensive file format support, an efficient internal representation, and fairly powerful image processing capabilities. You can use it as a very fast and simple way to visualize `numpy` arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-26T10:56:00.279341Z",
     "start_time": "2020-01-26T10:56:00.164891Z"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from PIL import Image\n",
    "\n",
    "def display_image(x):\n",
    "    x_scaled = np.uint8(255 * (x - x.min()) / (x.max() - x.min()))\n",
    "    return Image.fromarray(x_scaled)\n",
    "\n",
    "display_image(np.random.rand(200,200))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Steps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [Rich Output in Jupyter](./jupyter-basics-tutorial.ipynb#Rich-Output): Learn how you can use the Jupyter display system to incorporate a broad range of content into your Notebooks.\n",
    "- [Jupyter Tipps & Tricks](./jupyter-tipps.ipynb): Explore some amazing functionalities that you can use with Jupyter within the workspace.\n",
    "- [Introduction to Numpy](./numpy-tutorial.ipynb): Introduction to datatypes, arrays, and mathematical operations of the numpy library.\n",
    "- [Introduction to Pandas](./pandas-tutorial.ipynb): Introduction to the data structures and functionalities of the pandas library."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "Charts in Colaboratory",
   "provenance": [],
   "toc_visible": true,
   "version": "0.3.2"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
